create Docker builds and add docker-compose config for self-hosting #280

Open
wants to merge 48 commits into base: dev

jshimko commented Sep 26, 2024

By popular demand, I bring you... Docker builds! This PR adds support for building/starting the entire project with a single command and provides a base docker-compose.yml that can be used as a reference for how to self-host Stack using Docker builds instead of Vercel.

TL;DR

To test this out, you can clone my fork and just run the new Docker startup command...

git clone --single-branch -b docker-builds https://github.com/jshimko/stack.git stack-docker
cd stack-docker

pnpm docker:up

# which is just a convenient alias for...
# docker compose up -d && docker compose logs -f dashboard backend

When running this for the first time, Docker Compose will build the dashboard and backend Docker images and then start them both once complete. It will then tail the logs of both so you can watch for any issues or watch request logs. Once the containers are all started, you should now be able to see the dashboard at http://localhost:8101 and should be able to sign in. That's it! The database is already migrated and a new "self-host" seed script has been run inside the container to ensure the required starting data exists.

Details

The first run of pnpm docker:up builds the Docker images. If you change code and want a new build, you have to explicitly run pnpm docker:build and then pnpm docker:up again. There's also a pnpm docker:reset command that kills all of the containers and deletes all data volumes so you can quickly start over from scratch with fresh databases/services. When the db is empty on first start, the backend will automatically migrate and seed the database. And if there are new migrations in a build, the backend will apply them at startup. See the root package.json for more details on these new commands.

Most of the Dockerfile and docker-compose stuff is fairly self-explanatory and I tried to include plenty of comments for anyone that might be digging into those bits. Other than that, I just had to make a few small tweaks to code to be able to support build/deployment in Docker rather than Vercel. The biggest one would be the support for runtime environment variables on the client side.

NextJS runtime env

As you probably know, NextJS compiles all NEXT_PUBLIC_ env vars into static values when running next build. This isn't an issue on Vercel, but it's definitely an issue in a Docker build because it means you need to hard code runtime values like URLs or analytics API keys into your Docker images, so you'd have to create a new build for every URL change even though the code hasn't changed. Fortunately, you can use a package called next-runtime-env to fix this problem (see its README for a detailed explanation). The only change required to adopt that package is to replace every process.env.NEXT_PUBLIC_ value in the code with that package's env('NEXT_PUBLIC_WHATEVER') helper. That ensures the values set at runtime are always up to date and aren't hard coded in the build output. So I replaced every process.env.NEXT_PUBLIC_X variable with env('NEXT_PUBLIC_X') in the dashboard and backend (mostly the dashboard).
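To make the build-time inlining problem concrete, here is a toy simulation of it. This is not how Next.js is implemented, just a string replacement that behaves similarly for illustration; the function name inlineAtBuild, the variable NEXT_PUBLIC_API_URL and the example URL are all made up for this sketch.

```typescript
type Env = Record<string, string | undefined>;

// Toy stand-in for a bundler: every `process.env.NEXT_PUBLIC_*` reference
// in the source text is replaced by a string literal taken from the
// environment that was present at build time.
function inlineAtBuild(source: string, buildEnv: Env): string {
  return source.replace(
    /process\.env\.(NEXT_PUBLIC_[A-Z0-9_]+)/g,
    (match: string, name: string) => {
      const value = buildEnv[name];
      // In this toy version, variables unknown at build time are left alone.
      return value === undefined ? match : JSON.stringify(value);
    },
  );
}

const source = "fetch(process.env.NEXT_PUBLIC_API_URL + '/health')";
const bundled = inlineAtBuild(source, {
  NEXT_PUBLIC_API_URL: "https://staging.example.com",
});

// The staging URL is now part of the output text, so changing the
// variable when the container starts has no effect on this bundle.
console.log(bundled);
```

Once the value is baked into the bundle like this, the only way to change it is to build again, which is the problem a runtime lookup avoids.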

Related to all of that, I also realized that the @stackframe/stack package was using several process.env.NEXT_PUBLIC_X values as well, so we'd need the same thing there. However, I didn't want to force next-runtime-env to be a dependency of that package. I also didn't want to create any conflicts for customers that may already be using next-runtime-env, because that package writes the NEXT_PUBLIC_ values to window.__ENV in the browser and having two instances of that could result in them stepping on each other's toes when writing values to the window object. To rule that out, I created a simple helper component that does the exact same thing as next-runtime-env except that it writes internal Stack vars to window.__STACK_ENV__. You can find that in packages/stack/src/lib/env. So it solves the same problem in a couple dozen lines of code with no third party dependency and removes the possibility of conflicts when people are already using next-runtime-env.
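A minimal sketch of the idea, for readers who don't want to open the file. This is not the code in packages/stack/src/lib/env; the function names (collectPublicEnv, renderEnvScript, readEnv) and the example variable names are hypothetical, and the environment is passed in as a plain object so the snippet runs anywhere.

```typescript
type Env = Record<string, string | undefined>;

// Keep only the public variables; everything else must stay on the server.
function collectPublicEnv(env: Env): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith("NEXT_PUBLIC_") && value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}

// Body of an inline <script> that the server would render per request,
// using the environment the container was started with.
function renderEnvScript(env: Env): string {
  // Escape "<" so a value can never close the surrounding <script> tag.
  const json = JSON.stringify(collectPublicEnv(env)).replace(/</g, "\\u003c");
  return `window.__STACK_ENV__ = ${json};`;
}

// Client-side lookup; `scope` stands in for `window` in this sketch.
function readEnv(
  scope: { __STACK_ENV__?: Record<string, string> },
  key: string,
): string | undefined {
  return scope.__STACK_ENV__?.[key];
}

console.log(
  renderEnvScript({
    NEXT_PUBLIC_STACK_URL: "https://auth.example.com",
    DATABASE_URL: "never sent to the browser",
  }),
);
```

Using a dedicated global name is what keeps this from colliding with an app that already loads next-runtime-env and its window.__ENV object.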

New NEXT_PUBLIC_INSECURE_COOKIE config

Since Docker builds result in a Node app running with NODE_ENV=production, cookies were expected to come from an https:// URL. As a result, running the builds with docker-compose on localhost didn't work and caused an infinite loop of redirects in the browser. To work around that limitation, I added a new env var called NEXT_PUBLIC_INSECURE_COOKIE that allows you to use the dashboard on localhost when NODE_ENV=production. I added detailed comments about this in the code and noted that it should NEVER be used in production. See packages/stack/src/lib/cookie.ts for details. (Addressed outside of this PR)
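Roughly, the decision looks like the sketch below. This is an illustration and not the code in packages/stack/src/lib/cookie.ts: the function name is hypothetical, and treating the literal string "true" as the enabling value is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Decide whether a cookie should carry the Secure attribute.
function shouldSetSecureFlag(env: Env): boolean {
  const isProduction = env.NODE_ENV === "production";
  // Assumption: the override is switched on with the literal string "true".
  const insecureAllowed = env.NEXT_PUBLIC_INSECURE_COOKIE === "true";
  // Secure cookies are only sent over https, so plain http://localhost
  // needs the override when the app runs with NODE_ENV=production.
  return isProduction && !insecureAllowed;
}

console.log(shouldSetSecureFlag({ NODE_ENV: "production" }));
console.log(
  shouldSetSecureFlag({ NODE_ENV: "production", NEXT_PUBLIC_INSECURE_COOKIE: "true" }),
);
```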

Self host seed script

Instead of messing with the existing seed script that is used for local dev, I created a new one that is intended to be used inside the backend Docker container on first startup. It's pretty self-explanatory if you read through it. The main difference from the original seed script is it allows you to pass a few new env vars that can configure a default admin user and optionally disable signups to the internal project. Both the admin user and the signup disable are optional and both default to being skipped, so the default behavior is you sign up to create your initial account just like the original seed script. The downside of that though is you won't have access to the internal Stack Dashboard project once in the dashboard. You'll be a member of it, but you won't be able to manage it or add other users. The new seed script admin user does get access to that project, so that's probably what most people will prefer when self-hosting.

You can find these new env options in the new .env file at apps/backend/.env.docker. Note that the new docker-compose.yml loads that file automatically. Also note that I added an apps/dashboard/.env.docker file as well so that Docker deployments and local dev have their own distinct configs. This is important for several reasons. First, the old docker-compose config used for local dev is still in place so that pnpm dev functions the same way it already did. That means the new docker-compose config has its own databases, etc. and the URLs and db ports are different to avoid any conflicts or confusion. Second, you can't set URL env vars to http://localhost:PORT inside a Docker container because localhost is no longer your machine in that context. It's the container itself. So you need a URL that resolves to the right place from within the container, but also resolves to localhost outside the container. Fortunately, Docker has a solution for this with host.docker.internal. In short, any URLs that have localhost in them in local dev needed to be converted to host.docker.internal when running in a container. See Docker's docs on this workaround. That is enabled in the docker-compose config by these lines on the backend and dashboard...

  backend:
    ...
    extra_hosts:
      - "host.docker.internal:host-gateway"

Docker should already have taken care of this hostname resolution for you when it was installed, but just in case it didn't, you can add the following to your /etc/hosts file...

127.0.0.1 host.docker.internal
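As an illustration of the localhost to host.docker.internal conversion described above, here is a small helper. It is a hypothetical sketch (the PR changes the values in the .env.docker files rather than rewriting URLs in code), and the function name and example URLs are made up.

```typescript
// Rewrite a localhost URL so that it still reaches the host machine
// when it is resolved from inside a container.
function toContainerUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  if (url.hostname === "localhost" || url.hostname === "127.0.0.1") {
    // Port, path and query string are preserved; only the host changes.
    url.hostname = "host.docker.internal";
  }
  return url.toString();
}

console.log(toContainerUrl("http://localhost:8101/handler"));
console.log(toContainerUrl("https://auth.example.com/api"));
```

URLs that already point at a real hostname pass through unchanged, so the same helper is safe to apply to every URL-valued setting.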

CI Docker Builds

Lastly, I added a GitHub workflow to build and publish the new Docker builds. All that is required to get them working is to provide 3 secrets in your repo or org configs.

DOCKER_REPO
DOCKER_USER
DOCKER_PASSWORD

So, for example, if you create a stack-auth org on Docker Hub, DOCKER_REPO would just be your org name of stack-auth and the user/password values could be for any user that has push access to your account.

As for the build tags that are created by this workflow, there are 3 different potential tag formats. Any time you merge to dev or main, the workflow will build and push two tags - one is the short SHA from the commit (first 7 chars) and the other is the branch name. So a commit to dev would look like this:

# assuming you go with the `stack-auth` org on Docker Hub

stack-auth/stack-dashboard:abc1234
stack-auth/stack-dashboard:dev

stack-auth/stack-backend:abc1234
stack-auth/stack-backend:dev

In this case, the :dev branch tag will always be an alias for the latest build on the dev branch while the :abc1234 tag is the specific commit hash. This allows users to just pull the "latest" of dev or pin to a specific commit. The main branch builds work the same way.

I also configured it to build when tagged with a version number. I know you don't currently use tags on your releases, but I think it'd be really helpful if you did so it's clear when package versions have actually changed. But also because that triggers a special tag format in the docker/metadata-action that automates the tagging. When you tag a commit with a version number in the format of 1.2.3, that tells the metadata action that this is an official release and the resulting Docker builds that get pushed will be in the format:

stack-auth/stack-dashboard:1.2.3
stack-auth/stack-dashboard:latest

stack-auth/stack-backend:1.2.3
stack-auth/stack-backend:latest

That allows users to pick a specific production release or just always pull the "latest" stable prod release.

docker pull stack-auth/stack-dashboard:latest
docker pull stack-auth/stack-backend:latest
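The tagging rules above can be summarised in a few lines of code. This is only a model of the behaviour described in this PR, not the docker/metadata-action implementation; the GitRef type and imageTags function are hypothetical names.

```typescript
type GitRef =
  | { kind: "branch"; name: string; sha: string }
  | { kind: "tag"; name: string };

// Compute the image tags the workflow is described as pushing.
function imageTags(repo: string, image: string, ref: GitRef): string[] {
  const base = `${repo}/${image}`;
  if (ref.kind === "branch") {
    // Branch builds get the short SHA (first 7 chars) plus the branch name.
    return [`${base}:${ref.sha.slice(0, 7)}`, `${base}:${ref.name}`];
  }
  // Version tags in the 1.2.3 format mark an official release.
  if (/^\d+\.\d+\.\d+$/.test(ref.name)) {
    return [`${base}:${ref.name}`, `${base}:latest`];
  }
  // Any other git tag produces no image tags in this sketch.
  return [];
}

console.log(
  imageTags("stack-auth", "stack-dashboard", {
    kind: "branch",
    name: "dev",
    sha: "abc1234def5678",
  }),
);
console.log(imageTags("stack-auth", "stack-backend", { kind: "tag", name: "1.2.3" }));
```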

Misc

I also updated Prisma to the latest 5.20.0 release. I know that probably seems unrelated, but you were previously on a fairly old version that didn't have a Prisma binary available for the linux/arm64 architecture. The reason this mattered is that anyone trying to build on a new M series MacBook would get an error when running pnpm install inside the Debian container that these Docker images are based on. Updating to a more recent Prisma version solved this.

Ok, I think that's everything. I've been using all of this in our own Kubernetes deployments for the last couple weeks while iterating on things and everything is really stable for me, so I think I smoothed out all the rough edges. Let me know if you have any questions or if there's anything else I can do!


vercel bot commented Sep 26, 2024

@jshimko is attempting to deploy a commit to the Stack Team on Vercel.

A member of the Team first needs to authorize it.

@csyedbilal csyedbilal mentioned this pull request Oct 1, 2024
N2D4 (Contributor) commented Oct 8, 2024

This is excellent, thank you so much for your contribution!

From a first glance, most of this seems great. My main concern is about the handling of the environment variables; this PR (and next-runtime-env) uses noStore, which disables static site generation and partial pre-rendering, both for ourselves and our customers who use the @stackframe/stack package.

I would actually argue that requiring a Docker image rebuild when updating envvars is by design; this way, the statically generated files (and with those the initial Next.js response) can contain as much information as possible. This means we can't easily publish a Docker image to a registry, though. IMO this is an acceptable tradeoff; if you're self-hosting, you're setting yourself up to struggle with much harder things than just rebuilding a Docker image from scratch. (DB migrations, for example.)

What do you think?

jshimko (Author) commented Oct 8, 2024

Thanks for the review @N2D4! So, a couple thoughts...

First, noStore only disables static rendering for the component that it is used in. A few lines from their docs...

unstable_noStore can be used to declaratively opt out of static rendering and indicate a particular component should not be cached.

unstable_noStore is preferred over export const dynamic = 'force-dynamic' as it is more granular and can be used on a per-component basis.

In this particular case, we're only affecting a single NextJS <Script/> component. Other components outside of that should render the same as previously.

As for hard coding environment variables in Docker builds, that is very much against best practices for a variety of reasons, but particularly because it breaks the best practice of Docker builds always being stateless (also long considered a best practice for software in general by the classic "12 Factor App" methodology). Docker even mentions that in their best practices for building under the section "Create ephemeral containers", which also links directly to the 12 Factor site...

Refer to Processes under The Twelve-factor App methodology to get a feel for the motivations of running containers in such a stateless fashion.

See also Factor 3 - Config: https://12factor.net/config

Requiring every user to build their own images just to set a dynamic URL also means nobody can even use the same build between dev, staging, production, etc. And the only reason for that is because the URL changes between those envs. Supporting runtime config entirely solves that. If I deploy a build for my staging environment and thoroughly test everything, I need to be able to use that same build again when I promote it to production. Otherwise a new Docker build doesn't guarantee everything is 100% the same. At the very least, it's unnecessary overhead to create a duplicate build that only changes a few environment variables.

Perhaps more importantly, hard coding env into Docker builds means that nobody can ever publish reusable Docker builds. That one is kind of a non-starter for us. For example, all of our Kubernetes deployments (dev/staging/production) are completely automated and the release process goes from 1) development (latest of main branch) to 2) staging (a commit tagged for release) to 3) production (same tagged release promoted from staging) and the same Docker builds are used from end to end. Having to build and publish a new image for every deployment adds a lot of opportunities for build inconsistencies, needless CI/CD complexity, and tons of Docker image storage that would all otherwise be avoided by supporting runtime configs in a single reusable build.

Lastly, having Docker builds be reusable means that you (Stack Auth) can publish "official" builds (using the GitHub workflow I added in this PR). Since most self-host users aren't modifying the code, most won't even need to bother with the build step. There's a big advantage to having a single official source of production builds that everyone uses. This completely removes the "works with my build" debugging headache that will inevitably turn into a time consuming community support nightmare. With official builds, the only thing that differs between users is the config they pass in. That greatly reduces the number of things that could be preventing a deployment from working correctly. If it works for one properly configured deployment, it should work for every properly configured deployment because you can be sure it is 100% the same code and build output. That literally allows you to point to working configs in the docs and just say "works on my machine"! :)

Unrelated, just wanted to clarify on this comment about migrations...

IMO this is an acceptable tradeoff; if you're self-hosting, you're setting yourself up to struggle with much harder things than just rebuilding a Docker image from scratch. (DB migrations, for example.)

Migrations are actually automated in the Docker builds. They run on backend container startup in the entrypoint script (which can be disabled with the STACK_SKIP_MIGRATIONS env var if needed). So any time a new migration is released, it will automatically apply when the new backend build is deployed. I'd argue this is even easier than deploying on Vercel where you need to manually apply migrations to your database and try to time it with the code release that depends on it. Not running on startup also assumes all self-host users will even be aware that a new Stack release has new migrations in it. That's why automation is key here. As you know, deploying code that expects migrations to have been run can very easily lead to production downtime when Prisma falls over due to schema mismatches. Automating migrations ensures the app can't even start up until the new migrations have been successfully applied. Also, manually doing stuff to a production database is no fun!
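In rough terms, the startup order works like the sketch below. The real entrypoint is a shell script in the backend image; this TypeScript version only models the control flow, the hook names are hypothetical, and treating "true" as the value that disables migrations is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface StartupHooks {
  migrate: () => void; // expected to throw if a migration fails
  startServer: () => void;
}

// Run migrations first, then start the server. If migrate() throws,
// startServer() is never reached, so the app cannot come up against
// a schema it does not expect.
function boot(env: Env, hooks: StartupHooks): void {
  // Assumption: the literal string "true" is what skips migrations.
  if (env.STACK_SKIP_MIGRATIONS !== "true") {
    hooks.migrate();
  }
  hooks.startServer();
}

const log: string[] = [];
boot(
  {},
  {
    migrate: () => { log.push("migrate"); },
    startServer: () => { log.push("start"); },
  },
);
console.log(log);
```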

So, not sure if I made a convincing case here, but happy to answer any questions or clarify anything further if you're still concerned about these changes. Let me know what you think!

N2D4 (Contributor) commented Oct 8, 2024

In this particular case, we're only affecting a single NextJS <Script/> component. Other components outside of that should render the same as previously.

What this means in practice is that this <Script /> component will suspend, which will pause the rendering of all components up to the closest Suspense boundary. Since <StackProvider /> is at the very top of the layout.tsx, this would be the entire page. We could wrap <StackProvider /> in a Suspense boundary, but in that case the <Script /> would not be the first script that runs on the page, and window.__STACK_ENV__ will not be available when the statically rendered components are hydrated in the browser.

Migrations are actually automated in the Docker builds. They run on backend container startup in the entrypoint script (which can be disabled with the STACK_SKIP_MIGRATIONS env var if needed). So any time a new migration is released, it will automatically apply when the new backend build is deployed. [...]

That only works if you don't worry about downtime during migrations. Think of the following scenario when renaming a column from A to B:

  • DB v1, server v1: The initial version where the col is named A.
  • DB v2, server v1: We add a new col named B.
  • DB v2, server v2: We update the server to write to both A and B, but it still reads from A.
  • DB v3, server v2: We do a database migration where we copy A to B.
  • DB v3, server v3: We update the server to write and read from B only.
  • DB v4, server v3: We delete A.

You need to coordinate the server and DB updates when migrating, but if you don't and you do all the updates at the same time (particularly if you're running multiple revisions of the server at the same time as part of your autoscaling/rollout), things will break. This is acceptable if you're fine with a few minutes of downtime, but not otherwise.
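The rollout above can be checked with a toy compatibility model. Everything in this sketch is illustrative: the version tables just encode the six steps from the list, and "compatible" only means that the columns a server touches exist and that B has been backfilled before anything reads from it.

```typescript
interface DbVersion { columns: string[]; bBackfilled: boolean; }
interface ServerVersion { reads: string; writes: string[]; }

const db: Record<number, DbVersion> = {
  1: { columns: ["A"], bBackfilled: false },
  2: { columns: ["A", "B"], bBackfilled: false }, // B added
  3: { columns: ["A", "B"], bBackfilled: true },  // A copied to B
  4: { columns: ["B"], bBackfilled: true },       // A deleted
};

const server: Record<number, ServerVersion> = {
  1: { reads: "A", writes: ["A"] },
  2: { reads: "A", writes: ["A", "B"] },
  3: { reads: "B", writes: ["B"] },
};

function compatible(dbV: number, serverV: number): boolean {
  const d = db[dbV];
  const s = server[serverV];
  const columnsExist = [s.reads, ...s.writes].every((c) => d.columns.includes(c));
  // Reading from B is only safe once the old data has been copied over.
  const dataComplete = s.reads !== "B" || d.bBackfilled;
  return columnsExist && dataComplete;
}

// [db version, server version] pairs in the order given in the list above.
const rollout: Array<[number, number]> = [[1, 1], [2, 1], [2, 2], [3, 2], [3, 3], [4, 3]];

console.log(rollout.every(([d, s]) => compatible(d, s)));
// Applying all DB migrations at once while an old server is still running:
console.log(compatible(4, 1));
```

Every pair in the staged rollout passes the check, while DB v4 with server v1 (or v2) does not, which is the failure mode when all migrations are applied in one step during a rolling deploy.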


I get your point regarding configless container builds. Cal.com has a similar setup and they publish a Docker image that's designed for local usage, alongside a build script to customize it for production; we could have something like that. The other alternative is that we disable static rendering/PPR on the Docker version, and essentially do what you did here, while keeping it in the main deployment. I'll think about this a bit today.

N2D4 (Contributor) commented Oct 29, 2024

I talked to the Next.js team and they have an --experimental-build-mode CLI flag that we should be able to use to not inline variables. This would do what we want, though I'm not sure if the behavior has already been released with Next.js 15 or not — at least it's a new avenue though.

dbjpanda commented

@jshimko @N2D4 Great work happening here. Would love to try it out soon.

fomalhautb (Contributor) commented

@jshimko Can you give us the permission to edit this PR?
